Add inference backend A/B benchmark by RitwijParmar · Pull Request #2777 · PrimeIntellect-ai/prime-rl

RitwijParmar · 2026-06-11T21:44:03Z

Summary

add an endpoint-level benchmark framework for OpenAI-compatible inference backends such as vLLM, routers, and Dynamo experiments
support both one-off comparisons and JSON scenario-suite runs
include a ready-to-run four-profile suite for short-rollout latency, long-context prefill, high-concurrency decode, and session-cache reuse
measure warmup-excluded request throughput, output throughput, streaming TTFT, p50/p95/p99 latency, error rate, and per-request failures
snapshot /metrics before and after each backend with a self-contained vLLM counter parser for token deltas, prefix-cache hit rate, and NIXL failure counters
add optional per-scenario regression gates so backend experiments can fail on throughput, latency, or error-rate regressions before being wired into full RL runs
write aggregate Markdown reports and JSON samples so backend comparisons can be reviewed and debugged after the run

Related to #1166. This does not duplicate the Dynamo backend implementation. It gives the project a repeatable way to compare Dynamo, vLLM, or router endpoints under rollout-like traffic before moving a backend into training.

Checks

uv run --no-sync ruff check benchmarks/scripts/inference_backend_benchmark.py tests/unit/test_inference_backend_benchmark.py
uv run --no-sync python -m py_compile benchmarks/scripts/inference_backend_benchmark.py tests/unit/test_inference_backend_benchmark.py
PYTHONPATH=src uv run --no-sync pytest --confcutdir=tests/unit tests/unit/test_inference_backend_benchmark.py -q
git diff --check

Note: the sync step of a regular uv run is blocked locally on macOS because this checkout's lockfile only supports Linux platforms. The repo-level pytest conftest also pulls in heavier runtime setup that is unrelated to this pure benchmark module, so I ran the focused tests with --confcutdir=tests/unit after populating submodules and installing the minimal local test tooling.

RitwijParmar · 2026-06-11T22:04:32Z

I opened this as a draft because the next useful step is a real vLLM vs Dynamo run.

If there is a preferred Dynamo branch or launch command, I can run the suite against it and add the result artifact here.

The suite covers short rollout latency, long-context prefill, high-concurrency decode, and session-cache reuse.

RitwijParmar force-pushed the feat/dynamo-rollout-benchmark branch from 9100ad5 to 58d62e4 on June 11, 2026, 21:45

feat(benchmarks): compare inference backends

0beec65

RitwijParmar force-pushed the feat/dynamo-rollout-benchmark branch from 58d62e4 to 0beec65 on June 11, 2026, 21:57

RitwijParmar wants to merge 1 commit into PrimeIntellect-ai:main from RitwijParmar:feat/dynamo-rollout-benchmark
